Homework 3: 100 Points

This homework focuses on pipelining.

Please submit your answers in a document and upload it to Canvas by the due date.

Late submissions incur a 10% penalty per day.

Present your work neatly, clearly, and logically, and make your case; don’t leave your answer open to our interpretation.

1. (65 Points) Consider the code segment in RISC-V:

Loop: lw x1, 0(x2) ; load x1 from address 0 + x2

addiw x1, x1, 1 ; x1 = x1 + 1

sw x1, 0(x2) ; store x1 at address 0 + x2

addiw x2, x2, 4 ; x2 = x2 + 4

subw x4, x3, x2 ; x4 = x3 - x2

bne x4, x0, -24 ; branch to Loop if x4 != 0

Assume the initial value of x3 is x2 + 396.

    a. (5 Points) Write the equivalent C/C++ code for the RISC-V code above.
    b. (10 Points) Draw the timing diagram of the RISC-V code segment for a 5-stage RISC pipeline without any forwarding paths, but assuming that a register read and a write can occur in the same clock cycle. Assume that the comparison is performed in the ID stage and the PC is updated at the end of the ID cycle. Assume that the branch is handled by flushing the pipeline.
    c. (5 Points) How many cycles does the loop take to execute in part (b)?
    d. (5 Points) Draw the timing diagram of the RISC-V code segment for a 5-stage RISC pipeline with full forwarding hardware, including a register read and a write in the same clock cycle. Assume that the comparison is performed in the ID stage and the PC is updated at the end of the ID cycle. Assume that a branch is handled by predicting it as not taken.
    e. (5 Points) How many cycles does the loop take to execute in part (d)?
    f. (5 Points) Draw the timing diagram as in part (d), except that branches are handled using delayed branching. Hint: You do not need to schedule an instruction in the delay slot here; just show the delayed branch (by one cycle) in the timing diagram.
    g. (5 Points) Based on part (f), use the technique of scheduling the branch delay slot and rewrite the RISC-V code to take advantage of the delay slot.
    h. (3 Points) Draw the timing diagram of the RISC-V code in part (g).
    i. (3 Points) Can the compiler optimize the code in part (g) any further?
    j. (3 Points) Draw the timing diagram of the RISC-V code in part (i).
    k. (3 Points) How many cycles does the code in part (j) take to execute?
    l. (3 Points) Compute the speedup obtained in part (k) over parts (c) and (e).
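
As a bookkeeping aid for the cycle-count and speedup questions above, the sketch below shows one simple steady-state model for a loop: total cycles are the number of iterations times the interval between the starts of successive iterations, plus the cycles the last iteration needs to drain. The function names, the model, and all numbers are illustrative assumptions, not values from this assignment; your timing diagrams determine the actual figures.

```python
def loop_cycles(iterations, issue_interval, drain_cycles=0):
    """Total cycles for a loop under a simple steady-state model (an assumption).

    issue_interval: cycles from the fetch of the first instruction of one
        iteration to the fetch of the first instruction of the next.
    drain_cycles: extra cycles for the final iteration to leave the pipeline.
    """
    return iterations * issue_interval + drain_cycles


def speedup(cycles_before, cycles_after):
    """At a fixed clock rate, speedup is the ratio of the two cycle counts."""
    return cycles_before / cycles_after


# Made-up numbers for illustration only (not taken from this assignment).
slow = loop_cycles(iterations=10, issue_interval=12, drain_cycles=4)
fast = loop_cycles(iterations=10, issue_interval=8, drain_cycles=4)
print(slow, fast, round(speedup(slow, fast), 2))
```

Whichever counting convention you use for the drain cycles, state it explicitly and apply it consistently across the parts you compare.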
2. (15 Points) Computer C360 is built without pipelining and has a single cycle of 7 ns: IF 1 ns, ID 1.5 ns, EX 1 ns, MEM 2 ns, and WB 1.5 ns. The designers are considering building C470 as a five-stage pipeline based on the C360 data.
    a. What is the clock cycle time of the C470 5-stage pipeline?
    b. If there is a stall every four instructions, what is the CPI of C470?
    c. What is the speedup of C470 over C360, assuming the stall rate in part (b)?
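
The relationships behind these three questions can be sketched as follows. This is a minimal illustration under stated assumptions: the pipelined clock is set by the slowest stage, pipeline-register overhead is zero unless given, and the ideal CPI is 1. The function names and the stage latencies in the example are hypothetical and deliberately differ from the C360 data.

```python
def pipelined_clock_ns(stage_latencies_ns, register_overhead_ns=0.0):
    """Pipelined cycle time: slowest stage plus any latch overhead (assumed 0)."""
    return max(stage_latencies_ns) + register_overhead_ns


def cpi_with_stalls(stall_cycles_per_instruction, ideal_cpi=1.0):
    """Effective CPI is the ideal CPI plus average stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instruction


def speedup_over_single_cycle(stage_latencies_ns, stall_cycles_per_instruction):
    """Time per instruction, single-cycle design over pipelined design."""
    single_cycle_ns = sum(stage_latencies_ns)  # one long cycle, CPI = 1
    pipelined_ns = (cpi_with_stalls(stall_cycles_per_instruction)
                    * pipelined_clock_ns(stage_latencies_ns))
    return single_cycle_ns / pipelined_ns


# Hypothetical 4-stage machine with one stall every two instructions.
stages = [2.0, 1.0, 3.0, 2.0]
print(pipelined_clock_ns(stages))                        # slowest stage
print(cpi_with_stalls(0.5))                              # 1 + 1/2
print(round(speedup_over_single_cycle(stages, 0.5), 2))
```

Note that the speedup falls short of the number of stages both because the stages are unbalanced and because stalls raise the CPI above 1.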
3. (10 Points) Consider the following code segment:

fld f1, 0(x1)

fld f2, 0(x2)

fmul.d f3, f2, f1

fadd.d f3, f3, 1

addi x3, x3, 1

Assume the multi-functional pipeline of Figure C.30. Show the timing diagram of this instruction sequence for the pipeline without any forwarding hardware, but assuming that a register read and a write can occur in the same clock cycle.

4. (10 Points) Exercise C.7a (page C-75 of the textbook)